differentiable neural architecture search
Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
Recent works on One-Shot Neural Architecture Search (NAS) mostly adopt a bilevel optimization scheme that alternately optimizes the supernet weights and the architecture parameters after relaxing the discrete search space into a differentiable one. However, the non-negligible incongruence in their relaxation methods makes it hard to guarantee that differentiable optimization in the continuous space is equivalent to optimization in the discrete space. In contrast, this paper uses a variational graph autoencoder to injectively transform the discrete architecture space into an equivalent continuous latent space, resolving the incongruence. A probabilistic exploration enhancement method is devised accordingly to encourage intelligent exploration during architecture search in the latent space and to avoid local optima. Because catastrophic forgetting in differentiable One-Shot NAS degrades the supernet's predictive ability and makes the bilevel optimization inefficient, this paper further proposes an architecture complementation method to relieve this deficiency. We analyze the effectiveness of the proposed method and conduct a series of experiments comparing it with state-of-the-art One-Shot NAS methods.
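The incongruence the abstract refers to can be illustrated on a single edge. The sketch below assumes the common DARTS-style relaxation, in which the edge output is a softmax-weighted sum of all candidate operations, and compares it with the output after keeping only the highest-weighted operation. The three toy operations and the values of `alpha` and `x` are hypothetical and chosen only for illustration; this is not the paper's autoencoder-based method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations on one edge (illustrative only).
CANDIDATE_OPS = [
    lambda x: x,        # skip connection
    lambda x: 2.0 * x,  # toy "scaling" op
    lambda x: x * x,    # toy "nonlinear" op
]

def mixed_output(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))

def discrete_output(x, alpha):
    """Discretization: keep only the op with the largest architecture parameter."""
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return CANDIDATE_OPS[best](x)

alpha = [0.2, 0.5, 0.1]  # assumed architecture parameters
x = 3.0
gap = abs(mixed_output(x, alpha) - discrete_output(x, alpha))
print(f"mixed={mixed_output(x, alpha):.3f} "
      f"discrete={discrete_output(x, alpha):.3f} gap={gap:.3f}")
```

The gap shrinks only when one architecture parameter dominates the others, so a supernet optimized under the mixed output need not rank discrete architectures the same way.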
Data Aware Differentiable Neural Architecture Search for Tiny Keyword Spotting Applications
Shi, Yujia, Njor, Emil, Martínez-Nuevo, Pablo, Shepstone, Sven Ewan, Fafoutis, Xenofon
The success of Machine Learning is increasingly tempered by its significant resource footprint, driving interest in efficient paradigms like TinyML. However, the inherent complexity of designing TinyML systems hampers their broad adoption. To reduce this complexity, we introduce "Data Aware Differentiable Neural Architecture Search". Unlike conventional Differentiable Neural Architecture Search, our approach expands the search space to include data configuration parameters alongside architectural choices. This enables Data Aware Differentiable Neural Architecture Search to co-optimize model architecture and input data characteristics, effectively balancing resource usage and system performance for TinyML applications. Initial results on keyword spotting demonstrate that this novel approach to TinyML system design can generate lean but highly accurate systems.
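As a rough sketch of what a joint search space might look like, the snippet below treats data configuration decisions the same way as architectural ones: each decision gets one learnable logit per option, a softmax turns the logits into selection probabilities, and the final system is read off by taking the highest-scoring option for every decision. All decision names, option values, and logit values are hypothetical; the abstract does not specify the actual search space or how it is parameterized.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical joint search space (names and values are assumptions).
SEARCH_SPACE = {
    "sample_rate_hz": [8000, 16000],  # data configuration
    "num_mel_bins": [10, 20, 40],     # data configuration
    "conv_channels": [8, 16, 32],     # architecture
}

def init_logits(space):
    """One learnable logit per option, for data and architecture decisions alike."""
    return {name: [0.0] * len(options) for name, options in space.items()}

def choice_probabilities(logits):
    """Softmax over the options of each decision."""
    return {name: softmax(values) for name, values in logits.items()}

def discretize(space, logits):
    """Pick the highest-scoring option for every decision (first option on ties)."""
    chosen = {}
    for name, options in space.items():
        best = max(range(len(options)), key=lambda i: logits[name][i])
        chosen[name] = options[best]
    return chosen

logits = init_logits(SEARCH_SPACE)
logits["num_mel_bins"][1] = 1.5   # pretend the search favored 20 mel bins
logits["conv_channels"][2] = 0.7  # pretend the search favored 32 channels
probs = choice_probabilities(logits)
config = discretize(SEARCH_SPACE, logits)
print(config)
```

In an actual search the logits would be updated by gradient descent on a loss that trades accuracy against resource usage; here they are set by hand to keep the example self-contained.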
Review for NeurIPS paper: Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
Weaknesses: The paper is not very novel or significant in its contribution. It combines two regularization methods to mitigate two long-standing problems in differentiable NAS, but the proposed methods themselves are not very novel. NAS-Bench is not a well-established benchmark, and not many people are familiar with it. It is not fair to compare with existing work on NAS-Bench, as most of those methods were not optimized for NAS-Bench. For instance, DARTS may work equally well with proper hyperparameter tuning and regularization. With the existing DARTS hyperparameters, search on NAS-Bench converges to networks with only identity/skip operations.
Review for NeurIPS paper: Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
The reviewers generally found this paper to be a good contribution to the NAO/NAS field, with a good motivation and strong results. There were concerns about the novelty of the work, but after considering the authors' response, particularly in relation to EWC, I think the work is sufficiently novel, especially given the relatively new domain. I would encourage the authors to include the clarifications and the comparison to related work from the rebuttal in the main paper. The biggest issue that still lingers is that NAS-Bench-201 is a very small benchmark. The most positive reviewer strongly encourages the authors to apply their technique to a larger benchmark such as NAS-Bench-1Shot1.
On Constrained Optimization in Differentiable Neural Architecture Search
Maile, Kaitlin, Lecarpentier, Erwan, Luga, Hervé, Wilson, Dennis G.
Differentiable Architecture Search (DARTS) is a recently proposed neural architecture search (NAS) method based on a differentiable relaxation. Due to its success, numerous variants analyzing and improving parts of the DARTS framework have recently been proposed. By considering the problem as a constrained bilevel optimization, we propose and analyze three improvements to architectural weight competition, update scheduling, and regularization towards discretization. First, we introduce a new approach to the activation of architecture weights, which prevents confounding competition within an edge and allows for fair comparison across edges to aid in discretization. Next, we propose a dynamic schedule based on per-minibatch network information to make architecture updates more informed. Finally, we consider two regularizations, based on proximity to discretization and the Alternating Directions Method of Multipliers (ADMM) algorithm, to promote early discretization. Our results show that this new activation scheme reduces final architecture size and the regularizations improve reliability in search results while maintaining comparable performance to state-of-the-art in NAS, especially when used with our new dynamic informed schedule.
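The competition within an edge mentioned above follows from the softmax that standard DARTS applies across the candidate operations of each edge: raising one architecture weight necessarily lowers the activated weights of the others. The sketch below shows this coupling and contrasts it with an elementwise sigmoid, used here only as an assumed example of an uncoupled activation; it is not claimed to be the activation scheme the paper proposes. The logit values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical architecture weights for three candidate ops on one edge.
alpha = [0.0, 0.0, 0.0]
boosted = [2.0, 0.0, 0.0]  # only the first op's weight is increased

softmax_before = softmax(alpha)
softmax_after = softmax(boosted)
sigmoid_before = [sigmoid(v) for v in alpha]
sigmoid_after = [sigmoid(v) for v in boosted]

# Under softmax, the untouched second op loses weight (coupled competition).
print("softmax op 2:", softmax_before[1], "->", softmax_after[1])
# Under an elementwise activation, the untouched second op keeps its weight.
print("sigmoid op 2:", sigmoid_before[1], "->", sigmoid_after[1])
```

Softmax outputs also sum to one on every edge, which forces a fixed weight budget per edge regardless of how useful that edge's operations are.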